An application of many-facet Rasch measurement to validate the numeracy test for elementary students
Abstract
Numeracy skills are essential for students' academic achievement and for everyday decision-making; however, appropriate instruments for assessing them are lacking. The main objective of this study was to investigate the psychometric characteristics of a numeracy test of 16 items (12 multiple-choice and four essay items), which were evaluated by 12 expert raters. The study used Many-Facet Rasch Measurement (MFRM) to examine item difficulty, rater severity, and participant ability, providing an in-depth assessment of the validity and reliability of the test. The findings showed that all 16 items fit the Rasch model and had appropriate difficulty levels, indicating that the test effectively differentiates among participants' levels of numeracy ability. In addition, rater performance was consistent, which increases the dependability of the evaluation. The study highlights the need for modern psychometric techniques in educational evaluation to develop more effective instruments for assessing numeracy in mathematics education. It contributes to mathematics assessment and offers a basis for future research on improving educational measurement techniques.
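The three facets named above (participant ability, item difficulty, rater severity) are usually combined in one logit-linear model. In the standard rating-scale form of many-facet Rasch measurement, the log-odds of person n receiving score category k rather than k − 1 on item i from rater j is log(P_nijk / P_nij(k−1)) = B_n − D_i − C_j − F_k. Here B_n is ability, D_i is item difficulty, C_j is rater severity, and F_k is the threshold between adjacent categories. A dichotomous multiple-choice item is the special case with a single threshold of 0. The Python sketch below is illustrative only. It assumes exactly these three facets plus one shared rating scale, because the abstract does not give the study's full model specification. It also takes the parameters as already known rather than estimating them; estimation is normally done with dedicated software such as FACETS. The function names are hypothetical. The sketch computes category probabilities and the infit and outfit mean-square statistics commonly used to judge whether items and raters fit the model.

```python
import math


def category_probs(ability, difficulty, severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model (sketch).

    Adjacent-category log-odds: log(P_k / P_{k-1}) = ability - difficulty - severity - thresholds[k-1].
    With thresholds=[0.0] this reduces to the dichotomous Rasch model.
    """
    # Unnormalised log-probability of category k is the running sum of adjacent log-odds.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (ability - difficulty - severity - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(probs):
    """Model-expected score E and its variance W for one observation."""
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return expected, variance


def fit_mean_squares(observations):
    """Infit and outfit mean-squares for one element (e.g. an item or a rater).

    observations: list of (observed_score, category_probs) pairs.
    """
    sum_sq_resid = 0.0  # sum of (x - E)^2
    sum_variance = 0.0  # sum of W
    sum_z_sq = 0.0      # sum of standardized squared residuals (x - E)^2 / W
    for score, probs in observations:
        expected, variance = expected_and_variance(probs)
        sq_resid = (score - expected) ** 2
        sum_sq_resid += sq_resid
        sum_variance += variance
        sum_z_sq += sq_resid / variance
    infit = sum_sq_resid / sum_variance    # information-weighted mean-square
    outfit = sum_z_sq / len(observations)  # unweighted mean of z^2
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical parameters (logits) for one essay item scored 0-3; not estimates from this study.
    steps = [-1.0, 0.0, 1.0]
    lenient = category_probs(ability=1.0, difficulty=0.5, severity=-0.5, thresholds=steps)
    severe = category_probs(ability=1.0, difficulty=0.5, severity=0.5, thresholds=steps)
    print("expected score, lenient rater:", round(expected_and_variance(lenient)[0], 3))
    print("expected score, severe rater:", round(expected_and_variance(severe)[0], 3))
```

In this sketch, a more severe rater lowers the expected score for the same participant and item, which is the effect MFRM separates out when it estimates rater severity. Mean-square values near 1.0 indicate responses about as variable as the model predicts. Values well above 1.0 signal unexpected noise (underfit), and values well below 1.0 signal overly predictable responses (overfit). The acceptance band applied in this study is not stated in the abstract.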

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.